General framework for E(3)-equivariant neural network representation of density functional theory Hamiltonian

The combination of deep learning and ab initio calculation has shown great promise in revolutionizing future scientific research, but how to design neural network models incorporating a priori knowledge and symmetry requirements is a key challenging subject. Here we propose an E(3)-equivariant deep-learning framework to represent density functional theory (DFT) Hamiltonian as a function of material structure, which can naturally preserve the Euclidean symmetry even in the presence of spin–orbit coupling. Our DeepH-E3 method enables efficient electronic structure calculation at ab initio accuracy by learning from DFT data of small-sized structures, making the routine study of large-scale supercells (>104 atoms) feasible. The method can reach sub-meV prediction accuracy at high training efficiency, showing state-of-the-art performance in our experiments. The work is not only of general significance to deep-learning method development but also creates opportunities for materials research, such as building a Moiré-twisted material database.

The combination of deep learning and ab initio calculation has shown great promise in revolutionizing future scientific research, but how to design neural network models incorporating a priori knowledge and symmetry requirements is a key challenging subject. Here we propose an E(3)-equivariant deeplearning framework to represent density functional theory (DFT) Hamiltonian as a function of material structure, which can naturally preserve the Euclidean symmetry even in the presence of spin-orbit coupling. Our DeepH-E3 method enables efficient electronic structure calculation at ab initio accuracy by learning from DFT data of small-sized structures, making the routine study of large-scale supercells (>10 4 atoms) feasible. The method can reach sub-meV prediction accuracy at high training efficiency, showing state-of-the-art performance in our experiments. The work is not only of general significance to deep-learning method development but also creates opportunities for materials research, such as building a Moiré-twisted material database.
It has been well recognized that deep-learning methods could offer a potential solution to the accuracy-efficiency dilemma of ab initio material calculations. Deep-learning potential 1,2 and a series of other neural network models [3][4][5][6][7] are capable of predicting the total energies and atomic forces of given material structures, enabling molecular dynamics simulation at large length and time scales. The paradigm has been used for deep-learning research of various kinds of physical and chemical properties [8][9][10][11][12][13][14][15][16][17][18][19] . During the development of these methods, people have gradually come to realize that the introduction of symmetry considerations as a priori knowledge into neural networks is of crucial importance to the deep-learning approaches. For this purpose, people have drawn insights from a class of neural networks called the equivariant neural networks (ENNs) [20][21][22][23][24] . The key innovation of ENNs is that all the internal features transform under the same symmetry group with the input; thus, the symmetry requirements are explicitly treated and exactly satisfied. Symmetry has fundamental importance in physics, so ENNs will be especially advantageous when they are applied to the modeling of physical systems, as shown by a series of neural network models for various material properties 6,7,[13][14][15] .
Recently, a deep neural network representation of density functional theory (DFT) Hamiltonian (named DeepH) was developed by employing the locality of electronic matter, localized basis, and local coordinate transformation 25 . By the DeepH approach, the computationally demanding self-consistent field iterations could be bypassed, and all the electron-related physical quantities in the single-particle picture can, in principle, be efficiently derived. This opens opportunities for the electronic structure calculation of large-scale material systems. However, it is highly nontrivial to incorporate symmetry considerations into DeepH. Specifically, the property that the Hamiltonian matrix changes covariantly (i.e., equivariantly) under rotations or gauge transformations should be preserved by the neural network model for efficient learning and accurate prediction (Fig. 1). A strategy is developed in DeepH to apply local coordinate transformation which changes the rotation covariant problem into an invariant one and thus  25 . Nevertheless, the large amount of local coordinate information seriously increases the computational load, and the model performance depends critically on a proper selection of local coordinates, which relies on human intuition and is not easy to optimize. Therefore, we think that the combination of DeepH 28 . However, the key capability of DeepH that learns from DFT results on small-sized material systems and predicts the electronic structures of much larger ones has not been demonstrated by these methods. More critically, the existing equivariant methods have neglected the equivariance in the spin degrees of freedom, although the electronic spin and spin-orbit coupling (SOC) play a key role in modern condensed matter physics and materials science. With SOC, one should take care of the spin-orbital Hamiltonian, whose spin and orbital degrees of freedom are coupled and transform together under a change of coordinate system or basis set, as illustrated in Fig. 1. This would raise critical difficulties in designing ENN models due to a fundamental change of symmetry group. In this context, the incorporation of ENN models into DeepH is essential but remains elusive.
In this work, we propose DeepH-E3, a universal E(3)-equivariant deep-learning framework to represent the spin-orbital DFT Hamilto-nianĤ DFT as a function of atomic structure fRg by neural networks, which enables efficient electronic structure calculations of large-scale materials at ab initio accuracy. A general theoretical basis is developed to explicitly incorporate covariance transformation requirements of fRg7 !Ĥ DFT into neural network models that can properly take the electronic spin and SOC into account, and a code implementation of DeepH-E3 based on the message-passing neural network is also presented. Since the principle of covariance is automatically satisfied, efficient learning and accurate prediction become feasible via the DeepH-E3 method. Our systematic experiments demonstrate the stateof-the-art performance of DeepH-E3, which shows sub-meV accuracy in predicting DFT Hamiltonian. The method works well for various kinds of material systems, such as magic-angle twisted bilayer graphene or twisted van der Waals materials in general, and the computational costs are reduced by several orders of magnitude compared to direct DFT calculations. Benefiting from the high efficiency and accuracy as well as the good transferability, there could be promising applications of DeepH-E3 in electronic structure calculations. Also, we expect that the proposed neural network framework can be generally applied to develop deep-learning ab initio methods and that the interdisciplinary developments would eventually revolutionize future materials research.

Realization of equivariance
It has long been established as one of the fundamental principles of physics that all physical quantities must transform equivariantly between reference frames. Formally, a mapping f : X → Y is equivariant for vector spaces X and Y with respect to group G if D Y (g) ∘ f = f ∘ D X (g), ∀g ∈ G, where D X , D Y are representations of group G over vector spaces X, Y, respectively. The problem considered in this work is the equivariance of a mapping from the material structure fRg including atom types and positions to the DFT HamiltonianĤ DFT with respect to the E(3) group. The E(3) group is the Euclidean group in threedimensional (3D) space which contains translations, rotations, and inversion. Translation symmetry is manifest since we only work on the relative positions between atoms, not their absolute positions. Rotations of coordinates introduce nontrivial transformations, which should be carefully investigated. Suppose the same point in space is specified in two coordinate systems by r and r 0 . If the coordinate systems are related to each other by a rotation, the transformation rule between the coordinates of the point is r 0 = Rr, where R is a 3 × 3 orthogonal matrix.
In order to take advantage of the nearsightedness of electronic matter 29 , the Hamiltonian operator is expressed in the picture of localized pseudo-atomic orbital (PAO) basis. The basis is separated into radial and angular parts, having the form ϕ iα ðrÞ = R ipl ðrÞY lm ðrÞ.
Here i is the site index, α ≡ (plm), where p is the multiplicity index, Y lm is the spherical harmonics having angular momentum quantum number l and magnetic quantum number m, r ≡ |r − r i | andr ðr À r i Þ=|r À r i | where r i is the position of the ith atom. The transformation rule for the Hamiltonian matrix between the two coordinate systems described above is  where D l mm 0 ðRÞ is the Wigner D-matrix. The equivariance of the mapping fRg 7 !Ĥ DFT requires that, if the change of coordinates causes the positions of the atoms to transform, the corresponding Hamiltonian matrix must transform covariantly according to Eq. (1).
ENN is applied to construct the mapping fRg 7 !Ĥ DFT in order to preserve equivariance. The input, output, and internal feature vectors of ENNs all belong to a special set of vectors that have the form x l = (x l,l , …, x l,−l ) and transform according to the following rule: This vector is said to carry the irreducible representation of the SO(3) group of dimension 2l + 1. If the input vectors are transformed according to Eq. (2), then all the internal features and the output vectors of the ENN will also be transformed accordingly. Under this constraint, the ENN incorporates learnable parameters in order to model equivariant relationships between inputs and outputs.
The method of constructing the equivariant mapping fRg 7 !Ĥ DFT is illustrated in Fig. 2. The atomic numbers Z i and interatomic distances |r ij | ≡ |r i − r j | are used to construct the l = 0 input vectors (scalars). Spherical harmonics acting on the unit vectors of relative positionsr ij constitute input vectors of l = 1, 2, … . The output vectors of the ENN are passed through the Wigner-Eckart layer before representing the final Hamiltonian. This layer exploits the essential concept of the Wigner-Eckart theorem: " ⊕ " and " ⊗ " signs stand for direct sum and tensor product of representations, respectively. "=" denotes equivalence of representations, i.e., they differ from each other by a change of basis. The coefficients in the change of basis are exactly the celebrated Clebsch-Gordan coefficients. The representation l 1 ⊗ l 2 is carried by the tensor x l 1 l 2 , which transforms according to the rule Notice that Eq. (4) has the same form as Eq. (1), so the tensor x l 1 l 2 can exactly represent the output Hamiltonian satisfying the equivariant requirements.

Equivariance of the spin-orbital Hamiltonian
If we further consider the spin degrees of freedom, the transformation rule for the Hamiltonian becomes where σ 1 , σ 2 are the spin indices (spin up or down). The construction of the spin-orbital DFT Hamiltonian is a far more complicated issue. Electron spin has angular momentum l = 1/2, so it seems that tedious coding and debugging are unavoidable because we have to introduce complex-valued half-integer representations into the neural network, which typically only supports real-valued integer representations for the time being. Furthermore, a 2π rotation brings a vector in 3D space to itself but introduces a factor -1 to the spin-1/2 vector. This means that any mapping from 3D input vectors to l = 1/2 output vectors will be discontinued and cannot be modeled by neural networks, which poses a serious threat to our approach since we only have 3D vectors as input to the neural network ( Fig. 2). Fortunately, we observe that l = 1/2 appearing in the DFT Hamiltonian does not necessarily mean that half-integer representations  must be inserted everywhere into the neural network. In fact, they can be restricted to the final output layer as soon as we employ the transformation rule: There is no half-integer representation on the right-hand side; thus, it can be further decomposed into integer representations by repeatedly applying Eq. (3).
Another problem is associated with the introduction of complex numbers. Generally, the spin-orbital Hamiltonian matrix elements have complex values, and the ENN cannot simply predict its real and imaginary parts separately because this will violate equivariance. Ordinary neural networks of complex numbers are mostly still under their experimental and developmental stage, so the use of complexvalued ENN is practically difficult, if not impossible. Nevertheless, we have discovered a way to sidestep this problem. Under the bases which are eigenvectors of the time-reversal operator, the D-matrices of integer l will become purely real. Consequently, for a vector with integer l under that basis, its complex and real parts will never mingle with each other when the vector is multiplied by a real transformation matrix. Then one complex vector can be technically treated as two real vectors while preserving equivariance. Note that this is not true for half-integer representations, for that we must add up the real and imaginary parts before the integer representations are converted to half-integer representations in the Wigner-Eckart layer (Fig. 2).
Yet another subtle issue arises in Eq. (5). It is not exactly the same as Eq. (4) in that two of the D-matrices in the former equation are taken as complex conjugates, but those in the latter are not. In fact, instead of constructing a vector with representation ðl 1 1 2 Þ ðl 2 1 2 Þ, we must construct ðl 1 1 2 Þ ðl * 2 1 2 * Þ to represent the spin-orbital Hamiltonian described in Eq. (5). Here, l* denotes the representation whose representation matrix is replaced by its complex conjugate. This is not a problem for integer l, but is critical for l = 1/2. If not treated properly, the overall equivariance will be violated. In order to solve this problem, we first notice that the representation l* is still a representation of the SU(2) group with dimension 2l + 1. In fact, it is guaranteed to be equivalent to the representation l without complex conjugate. In other words, there must exist a unitary matrix P l for each integer or halfinteger l satisfying This is guaranteed by the fact that the quantum rotation operatorÛðgÞ commutes with the time-reversal operator T : hlm|ÛðgÞ|lm 0 i = hlm|T yÛ ðgÞT |lm 0 i = ðÀ1Þ mÀm 0 hl, À m|ÛðgÞ|l, À m 0 i * . The matrix P in Eq. (7) is thus given by Therefore, we only need to apply a change of basis to convert a vector carrying representation l to a vector carrying l*. Notice that this property holds even for material systems without time-reversal symmetry.
The workflow of constructing the DFT Hamiltonian is summarized and illustrated in Fig. 2. In order to construct a Hamiltonian with SOC, the output vectors from the ENN are first separated into two real components, then combined together into complex vectors and passed to the Wigner-Eckart layer. The Wigner-Eckart layer uses the rules in Eq. (3) and Eq. (6) to convert these vectors to tensors of the form in Eq. (4), except that the tensors here have rank 4 instead of 2. After that, the last spin index is converted to its complex conjugate counterpart by the change of basis using Eq. (8) for l = 1/2. The output tensors follow the same transformation rule under coordinate rotation as the DFT Hamiltonian in Eq. (5), and thus could be used to represent the DFT Hamiltonian matrix.
Finally, we discuss two remaining issues. To include parity, we will consider E(3) = SE(3) ⊗ {E, I}, where E is the identity and I is the spatial inversion. Under a coordinate transform, the vector is multiplied by −1 if it has odd parity and the coordinate transform involves spatial inversion. The parity of the Hamiltonian is determined by ðÀ1Þ l 1 + l 2 . In addition, there is a possible ambiguity in Eq. (5) since the mapping from a classical rotation R to a quantum rotation D 1 2 is not singlevalued. However, the possible factor −1 will always be canceled between the two D-matrices in that equation, which eliminates the potential problem.

The neural network architecture of DeepH-E3
Here we present the neural network architecture of the DeepH-E3 method. An illustration of the architecture can be found in Fig. 3. The general structure is based on the message-passing neural network 9,30 that has been widely used in materials research 6,7,[14][15][16][17][18][19]25 . The material structure is represented by a graph, where each atom is associated with a vertex (or node). Edges are connected between atom pairs with nonzero inter-site hopping, and self-loop edges are included to describe intra-site coupling. Every vertex i is associated with a feature v i and every edge ij with e ij . These features are composed of several  vectors defined in Eq. (2). As illustrated in Fig. 3a, the initial feature v ð0Þ i of vertex i is the trainable embedding of the atomic number Z i , and the initial e ij is the interatomic distance |r ij | expanded using the Gaussian basis e B (|r ij |) as defined in Eq. (11). The features of vertices and edges are iteratively updated using features of their neighborhood as incoming messages. Finally, the final edge feature e ij is passed through a linear layer and used to construct the Hamiltonian matrix block H ij between atoms i and j using the method illustrated in Fig. 2. It is worth mentioning that, under the message-passing scheme, the output Hamiltonian is only influenced by the information of its neighborhood environment. The nearsightedness property 29 ensures efficient linearscaling calculations as well as good generalization ability 25 .
The equivariant building blocks of the neural network are implemented using the scheme provided by Tensor-Field Networks 21 and e3nn 24,31 . The feature vectors x ðlÞ cm processed by these neural network blocks are implemented as dictionaries with key l, an integer which is the order of representation of the SO(3) group. c is the "channel index" ranging from 1 to n (l) , where n (l) is the number of channels at order l, and each channel refers to a vector defined in Eq. (2).
The E3Linear layer defined in Eq. (12) possesses learnable weights and biases, which is similar to linear layers in conventional neural networks, but only connects vectors of the same representation to preserve equivariance. The gate layer introduces equivariant nonlinearity, as proposed in ref. 22, where nonlinearly activated l = 0 vectors (i.e., scalars) are used as scaling factors ("gates") to the norms of l ≠ 0 vectors.
We propose a normalization scheme, E3LayerNorm, that normalizes the feature vectors using mean and variance obtained from the layer statistics while preserving equivariance: where ϵ is introduced to maintain numerical stability, g ðlÞ c ,b P l m = Àl |ðv i Þ ðlÞ cm À μ ðlÞ m | 2 , N is the total number of vertices. Here only the E3LayerNorm for vertex update blocks is described. The corresponding E3LayerNorm for edge update blocks is similar with the mean and variance obtained from edge features instead of vertex features. We find that E3LayerNorm significantly stabilizes the training process. A discussion about the use of E3LayerNorm can be found in Supplementary Note 5.
The previously discussed blocks do not include coupling between different l's. This problem is resolved by the tensor product layer: where C l 1 m 1 l 2 m 2 ;l 3 m 3 are Clebsch-Gordan coefficients, U ðlÞ cc 0 ,V ðlÞ cc 0 are learnable weights. This is abbreviated as z = (Ux) ⊗ (Vy).
The neural network architecture is illustrated in Fig. 3. The equivariant convolution block (EquiConv, Fig. 3c) encodes the information of an edge and the vertices connected to that edge. The core component of equivariant convolution is the tensor product (Eq. (10)) of the vertex and edge features (v i ||v j ||e ij ) and the spherical harmonics of the edge ij (Yðr ij Þ). Here || stands for vector concatenation. The tensor product introduces directional information of material structure into the neural network. Propagating directional information into neural networks is important, as emphasized by previous works 12,14 , which is realized in an elegant way here via the tensor product. The interatomic distance information is also encoded into the neural network. It is expanded using the Gaussian basis expansion and then fed into a fully connected neural network, whose output is multiplied element-wise to the output of gate nonlinearity.
The vertex update block (Fig. 3d) aggregates information from the neighboring environment. To update a vertex, every edge connected to that vertex contributes a "message" generated by the equivariant convolution (EquiConv) block. All the "messages" are summed and normalized to update the vertex feature. This is similar for the edge update block (Fig. 3e), except that only the output of EquiConv on edge ij is used for updating e ij . After several updates, the final edge feature vectors will serve as the neural network output and are passed into the Wigner-Eckart layer to construct the Hamiltonian matrix blocks, as illustrated in Fig. 2. More details are described in "Methods".

Capability of DeepH-E3
The incorporation of global Euclidean symmetry as a priori knowledge provided to the message-passing deep-learning framework in the DeepH-E3 model has led to its outstanding performance in terms of efficiency and accuracy. A remarkable capability of DeepH-E3 is to learn from DFT data on small structures and make predictions on varying structures of different sizes without having to perform further DFT calculations. This enables highly efficient electronic structure calculations of large-scale material systems at ab initio accuracy. All the DFT Hamiltonian matrices used for deep learning in this work are computed by the OpenMX code using the PAO basis. After example studies on monolayer graphene and MoS 2 datasets, we will first demonstrate the capability of DeepH-E3 by investigating twisted bilayer graphene (TBG), especially the well-known magic-angle TBG whose DFT calculation is important but quite challenging due to its huge Moiré supercell. Next, we will apply DeepH-E3 to study twisted van der Waals (vdW) materials with strong SOC, including bilayers of bismuthene, Bi 2 Se 3 , and Bi 2 Te 3 , for demonstrating the effectiveness of our equivariant approach to construct the spin-orbital DFT Hamiltonian. Finally, we will use our model to illustrate the SOC-induced topological quantum phase transition in twisted bilayer Bi 2 Te 3 , giving an example of exploring exotic physical properties in large-scale material systems.

Study of monolayer graphene and MoS 2
Before going to large-scale materials, we first validate our method on the datasets used in ref. 25 to benchmark DeepH-E3's performance. The datasets are comprised of DFT supercell calculation results of monolayer graphene and MoS 2 , and different geometric configurations are sampled from ab initio molecular dynamics. The test results are summarized in Table 1 and compared with those of the original DeepH method 25 , which, instead of using an explicitly equivariant approach, applied the local coordinate technique in handling the covariant transformation property of the Hamiltonian. Our experiments show that the mean absolute errors (MAEs) of Hamiltonian matrix elements averaged over atom pairs are all within a fraction of a meV, which are reduced approximately by a factor of 2 or more in all prediction targets compared with DeepH. Benefiting from the high accuracy of the deep-learning DFT Hamiltonian, band structures predicted by DeepH-E3 can accurately reproduce DFT results (Supplementary Fig. 1).

Application to twisted vdW materials
Our deep learning method is particularly useful for studying the electronic structure of twisted vdW materials. This class of materials has attracted great interest for research and applications since their Moiré super periodicity offers a new degree of freedom to tune manybody interactions and brings in emergent quantum phenomena, such as correlated states 32 , unconventional superconductivity 33 , and higherorder band topology 34 . Traditionally, it is challenging to perform computationally demanding DFT calculations on large Moiré structures. However, this challenge could be largely overcome by DeepH-E3. One may train the neural network models by DFT data on small, nontwisted, randomly perturbed structures and predict the DFT Hamiltonian of arbitrarily twisted structures bypassing DFT via deep learning, as illustrated in Fig. 4a. This procedure demands much less computational resources than directly doing DFT calculations on large twisted superstructures. Once the model is trained, it can be applied to study TBGs of varying twist angles. The performance is compared with that of DeepH. Test data includes DFT results for systems containing up to more than one thousand atoms per supercell. As summarized in Fig. 4b, DeepH-E3 significantly reduces the averaged MAEs of DFT Hamiltonian matrix elements by more than a factor of 2 as compared to DeepH, consistent with the above conclusion. Moreover, the MAEs reach ultralow values of 0.2-0.3 meV and gradually decrease with increasing Moiré supercell size (or decreasing twist angle). This demonstrates the good generalizability of DeepH-E3. The method is thus expected to be suitable for studying TBGs with small twist angles that are of current interest 35 .
We take the magic-angle TBG with θ = 1.08 ∘ and 11,164 atoms per supercell as a special example. The discoveries of novel physics relevant to flat bands in this system have triggered enormous interest in investigating twisted vdW materials. Due to the large supercell, DFT study of magic-angle TBG is a formidable task, but DeepH-E3 can routinely study such kind of material systems in a particularly accurate and efficient way. As shown in Fig. 4c, the electronic bands of magicangle TBG with relaxed structure computed by DeepH-E3 agree well with the published results obtained by DFT and low-energy effective continuum model 35 . The flat bands near the Fermi level are well reproduced. Some minor discrepancies appear away from the Fermi level, which could be partially explained by the methodological difference: the benchmark work uses the plane-wave basis, whereas our work employs the atomic-like basis, and the pseudopotential used is also different. Detailed discussions about the influence of basis set and pseudopotential are included in Supplementary Note 2.
Most remarkably, DeepH-E3 has the capability to reduce the computational cost of studying these large material systems by several orders of magnitude. The DFT calculation (including structural relaxation) on magic-angle TBG performed in ref. 35 took around 1 month on about five thousand CPU cores. In contrast, the major computational cost of DeepH-E3 comes from neural network training. Typically, only a few hundreds of DFT training calculations are needed, and the training process usually takes tens of GPU hours, but all these are only required to be done once. After that, DFT Hamiltonian matrices can be constructed very efficiently via neural network inference. The process time is on the order of minutes by one GPU for magic-angle TBG, which grows linearly with Moiré supercell size. Generalized eigenvalue problems are solved for 60 bands near the Fermi level to obtain the band dispersion, which only requires about 8 min per k-point for magic-angle TBG using 64 CPU cores. The low computational cost and high accuracy of DeepH-E3 demonstrate its potential power in resolving the accuracy-efficiency dilemma of ab initio calculation methods, and it would be highly favorable to future scientific research.

Study of twisted vdW materials with strong SOC
We have tested the performance of DeepH-E3 on studying twisted vdW materials with strong SOC, including twisted bilayers of bismuthene, Bi 2 Se 3 , and Bi 2 Te 3 . The latter two materials are more complicated, which include two quintuple layers and two kinds of elements (Fig. 5a for Bi 2 Te 3 ). The strong SOC introduces additional complexity in their electronic structure problems. Despite all these difficulties, the capability of DeepH-E3 is not influenced to any extent. Our method reaches sub-meV accuracy in predicting DFT Hamiltonians of test material samples, including nontwisted and twisted structures of bismuthene, Bi 2 Se 3 , and Bi 2 Te 3 bilayers. Impressively, the band structures predicted by DeepH-E3 match well with those obtained from DFT ( Supplementary Fig. 3). Moreover, we observe the remarkable ability of our model to fit a tremendous amount of data with moderate model capacity and relatively small computational complexity. For instance, the neural network model is able to fit 2.8 × 10 9 nonzero complex- Non-twisted DFT dataset Here the DFT benchmark calculations were performed with a different code using a plane-wave basis instead of an atomic-like basis and different pseudopotential, which could introduce numerical differences with respect to DeepH-E3. Source data are provided with this paper.
valued Hamiltonian matrix elements in the dataset with about 10 5 real parameters. The training time is about one day on a single GPU in order to reach sub-meV accuracy. More details are presented in Supplementary Note 3. Through these experiments, the capability of DeepH-E3 to represent the spin-orbital DFT Hamiltonian is well demonstrated.
In physics, the SOC can induce many exotic quantum phenomena, leading to emergent research fields of spintronics, unconventional superconductivity, topological states of matter, etc. Investigation of SOC effects is thus of fundamental importance to the research of condensed matter physics and materials science. The functionality of analyzing SOC effects is easily implemented by DeepH-E3. Specifically, we apply two neural network models to learn DFT Hamiltonians with full SOC (Ĥ 1 ) and without SOC (Ĥ 0 ) separately for the same material system. Then, we define a virtual Hamiltonian as a function of SOC strength (λ):Ĥ λ =Ĥ 0 + λĤ SOC , whereĤ SOC =Ĥ 1 ÀĤ 0 . By studying the virtual Hamiltonian at different λ, we can systematically analyze the influence of SOC effects on material properties.
As an example application, we employ the approach to investigate the topological properties of twisted bilayer Bi 2 Te 3 . DeepH-E3 can accurately predict the DFT Hamiltonian for both cases with or without SOC, as confirmed by band structure calculations using the predictedĤ DFT (Fig. 5b, c). Herein the SOC is extremely strong as caused by the heavy elements in the material. Consequently, the band structure changes considerably when SOC is turned on. The evolution of band structure as a function of SOC strength (Fig. 5d) provides rich information on the SOC effects. Importantly, the band gap closes and reopens when increasing the SOC strength, indicating a topological quantum phase transition from Z 2 = 0 to Z 2 = 1. This is further confirmed by applying symmetry indicators based on Kohn-Sham orbital analysis and by performing Brillouin-zone integration of Berry connection and curvature over all occupied states via the Fukui-Hatsugai-Suzuki formalism 36 . The topological invariant Z 2 turns out to be nonzero for the spin-orbital coupled system, suggesting that the twisted bilayer Bi 2 Te 3 (θ = 21.8 ∘ ) is topologically nontrivial. As DeepH-E3 works well for varying twist angles, the dependence of band topology on twist angle can be systematically computed, which will enrich the research of twisted vdW materials.

Discussion
Since the DFT HamiltonianĤ DFT transforms covariantly between reference frames, it is natural and advantageous to construct the mapping from crystal structure fRg toĤ DFT in an explicitly equivariant manner. In this context, we have developed a general framework to representĤ DFT with a deep neural network DeepH-E3 that fully respects the principle of covariance even in the presence of SOC. We have presented the theoretical basis, code implementation, and practical applications of DeepH-E3. The method enables accurate and efficient electronic structure calculation of large-scale material systems beyond the scope of traditional ab initio approaches, opening possibilities to investigate rich physics and novel material properties at a particularly low computational cost.
However, as the structure becomes larger, it becomes increasingly difficult to diagonalize the Hamiltonian matrix in order to obtain wavefunction-related physical quantities. This difficulty, instead of the limitations of DeepH-E3 method itself, will eventually become the bottleneck of accurate electronic structure predictions. Nevertheless, benefiting from the sparseness of the DFT Hamiltonian matrix under localized atomic orbital basis, many efficient O(N) algorithms with high parallel efficiency are available for studying large-scale systems (e.g., supercells including up to 10 7 atoms 37 ). A combination of the DeepH-E3 method with such efficient linear algebra algorithms will be a promising direction for future study. The unique abilities of DeepH-E3, together with the general framework of incorporating symmetry requirements and physical insights into neural network model design, might find wide applications in various directions. For example, the method can be applied to build a material database for a diverse family of Moiré-twisted materials. For each kind of material, only one trained neural network model will be needed for all the twisted structures in order to have full access to their electronic properties, which is a great advantage for high throughput material discovery. Moreover, since the deep-learning method does not rely on periodic boundary conditions, 2D materials with incommensurate twist angles can also be investigated, making the ab initio study of quasi-crystal phases possible. In addition, we could go one step further by calculating the derivative of the electronic Hamiltonian with respect to atomic positions via automatic differentiation techniques. This enables deep-learning investigation of the physics of electron-phonon coupling in large-scale materials, which has the potential to outperform the computationally expensive traditional methods of frozen phonon or density functional perturbation theory 38 . Furthermore, one may combine the deep learning method with advanced methods beyond the DFT level, such as hybrid functionals, many-body perturbation theory, time-dependent DFT, etc. These important generalizations, if any of them are realized, would greatly enlarge the research scope of ab initio calculation.

Datasets
Data generated in this study is available in public repositories at Zenodo [39][40][41] .
Monolayer graphene: The dataset is taken from ref. 25. The dataset consists of 450 graphene structures with 6 × 6 supercells, generated by ab initio molecular dynamics performed by the Vienna ab initio simulation package (VASP) 42 , using the PBE 43 exchangecorrelation functional and the projector-augmented wave (PAW) pseudopotentials 44,45 . The cutoff energy of the plane waves is 450 eV, and only the Γ point is used in our k-mesh. Five thousand frames are obtained at 300K with time step 1fs, and then one frame is taken out every 10 frames starting from the 500th frame. Thus, there are 450 structures in the dataset. The Hamiltonians for training are calculated with the OpenMX code using the PBE functional and normconserving pseudopotential with C6.0-s2p2d1 PAOs with 5 × 5Γ-centered k-sampling. Here 6.0 denotes the orbital cutoff radius in Bohr, s2p2d1 means there are 2 × 1 = 2 s-orbitals, 2 × 3 = 6 p-orbitals, and 1 × 5 = 5 d-orbitals.
Monolayer MoS 2 : The dataset is also taken from ref. 25. Five hundred structures with 5 × 5 supercells are generated by ab initio molecular dynamics performed by VASP with PAW pseudopotential and PBE functional. The cutoff energy of the plane waves is 450 eV, and only the Γ point is used in our k-mesh. One thousand frames are taken at 300K with time step 1fs. The first 500 unequilibrated structures are discarded, and the remaining 500 structures are taken into the dataset. The Hamiltonians for training are calculated with the OpenMX code using the PBE functional and norm-conserving pseudopotential with Mo7.0-s3p2d2 and S7.0-s2p2d1 PAOs with 5 × 5Γ-centered k-sampling.
Bilayer graphene: The dataset is also taken from ref. 25. Three hundred structures with 4 × 4 nontwisted supercells are generated by a uniform shift of one of the two vdW layers and inserting random perturbations to atomic positions in the mean time. The perturbations are within 0.1 Å along three cartesian directions. The supercells are constructed from bilayer unit cell structures relaxed with VASP 42 using PBE functional with vdW interaction corrected by DFT-D3 method with Becke-Jonson damping 46 . The optimal interlayer spacing is found to be 3.35 Å. The Hamiltonians of the dataset and twisted structures are all calculated with the OpenMX code using the PBE functional and norm-conserving pseudopotential with C6.0-s2p2d1 PAOs.
Bilayer bismuthene, Bi 2 Se 3 , and Bi 2 Te 3 : The same procedure is used to generate nontwisted 3 × 3 bilayer supercells. The numbers of structures are 576, 576, and 256 for bismuthene, Bi 2 Se 3 , and Bi 2 Te 3 , respectively, but only a randomly selected subset is used for training (details can be found in Supplementary Note 4). The interlayer spacing is 3.20 Å, 2.50 Å, and 2.61 Å for bismuthene, Bi 2 Se 3 , and Bi 2 Te 3 , respectively. The interlayer spacing is defined to be the vertical distance between the lowest atom in the upper layer and the highest atom in the lower layer. The Hamiltonians of the dataset and twisted structures are all calculated with the OpenMX code using the PBE functional and norm-conserving pseudopotential with Bi8.0-s3p2d2, Se7.0-s3p2d1 and Te7.0-s3p2d2 PAOs.

Details of neural network models
All the neural network models presented in this article are trained by directly minimizing the mean-squared errors of the model output compared to the Hamiltonian matrices computed by DFT packages, and the reported MAEs are also obtained from comparing model output to the DFT results. All physical quantities of materials are derived from the output Hamiltonian matrix.
Some details of neural network building blocks are described here. The Gaussian basis is adapted from ref. 4, which is defined as: e B ð|r ij |Þ n = exp À |r ij | À r n 2 where r n , n = 0, 1, … are evenly spaced, with intervals equal to Δ. The E3Linear layer is defined as: Here ϕ 1 and ϕ 2 are activation functions. In this work, we use ϕ 1 =SiLU and ϕ 2 =Sigmoid following ref. 7.
The ENN is implemented with the e3nn library 31 in version 0.3.5 and PyTorch 47 in version 1.9.0. The Gaussian basis expansion used as input to the EquiConv layer has a length of 128. The fully connected neural network in the EquiConv layer is composed of two hidden layers, each with 64 hidden neurons, using the SiLU function as nonlinear activation and a linear layer as output. A description of neural network hyperparameters for each material system and their selection strategy can be found in Supplementary Note 4.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The datasets for monolayer graphene, monolayer MoS 2 , bilayer graphene, and bilayer bismuthene are available in ref. 39. Dataset for bilayer Bi 2 Se 3 is available in ref. 40. Dataset for bilayer Bi 2 Te 3 is available in ref. 41. Instructions on reproducing the DeepH-E3 models on these datasets can also be found in the corresponding repositories. Source data are provided with this paper.